43 research outputs found

Artificial and natural duplicates in pyrosequencing reads of metagenomic data

Author: Fu Limin
Li Weizhong
Niu Beifang
Sun Shulei
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates.
Results: We implemented a method for identification of exact and nearly identical duplicates from pyrosequencing reads. This method performs an all-against-all sequence comparison and clusters the duplicates into groups using an algorithm modified from our previous sequence clustering method cd-hit. This method can process a typical dataset in ~10 minutes; it also provides a consensus sequence for each group of duplicates. We applied this method to the underlying raw reads of 39 genomic projects and 10 metagenomic projects that utilized the pyrosequencing technique. We compared the occurrences of the duplicates identified by our method and the natural duplicates made by independent simulations. We observed that the duplicates, including both artificial and natural duplicates, make up 4-44% of reads. The number of natural duplicates highly correlates with the samples' read density (number of reads divided by genome size). For high-complexity metagenomic samples lacking dominant species, natural duplicates make up only <1% of all duplicates. But for some other samples, such as transcriptomic samples, the majority of the observed duplicates might be natural duplicates.
Conclusions: Our method is available from http://cd-hit.org as a downloadable program and a web server. It is important not only to identify the duplicates from metagenomic datasets but also to distinguish whether they are artificial or natural duplicates. We provide a tool to estimate the number of natural duplicates according to user-defined sample types, so users can decide whether to retain or remove duplicates in their projects.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prognostic Roles of ceRNA Network-Based Signatures in Gastrointestinal Cancers

Author: Bairong Shen
Beifang Niu
Jiajia Chen
Xin Qi
Xingqi Chen
Yuanchun Zhao
Publication venue: Frontiers Media SA
Publication date: 01/07/2022

Gastrointestinal cancers (GICs) are high-incidence malignant tumors that seriously threaten human health around the world. Their complexity and heterogeneity make the classic staging system insufficient to guide patient management. Recently, competing endogenous RNA (ceRNA) interactions, which closely link the function of protein-coding RNAs with that of non-coding RNAs such as long non-coding RNA (lncRNA) and circular RNA (circRNA), have emerged as a novel molecular mechanism influencing miRNA-mediated gene regulation. In particular, ceRNA networks have proven to be powerful tools for deciphering cancer mechanisms and predicting therapeutic responses at the system level. Moreover, abnormal gene expression is one of the critical breaking events that disturb the stability of the ceRNA network, highlighting the role of molecular biomarkers in optimizing cancer management and treatment. Therefore, developing prognostic signatures based on cancer-specific ceRNA networks is of great significance for predicting clinical outcome or chemotherapy benefits in GIC patients. We herein introduce the current frontiers of ceRNA crosstalk in relation to their pathological implications and translational potential in GICs, review current research on prognostic signatures based on lncRNA- or circRNA-mediated ceRNA networks in GICs, and highlight the translational implications of ceRNA signatures for GIC management. Furthermore, we summarize the computational approaches for establishing ceRNA network-based prognostic signatures, providing important clues for deciphering GIC biomarkers.

Directory of Open Access Journals

CD-HIT Suite: a web server for clustering and comparing biological sequences

Author: Beifang Niu
Limin Fu
Weizhong Li
Ying Gao
Ying Huang
Publication venue: Oxford University Press

Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. To further assist CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels.

Crossref

PubMed Central

GenomeVIP: A cloud platform for genomic variant discovery and interpretation

Author: Chen Ken
DeNardo Erin
Ding Li
Fenyö David
Handsaker Robert E
Huang Kuan-lin
Koboldt Daniel C
Mashl R. Jay
Niu Beifang
Raphael Benjamin J
Scott Adam D
Wendl Michael C
Wyczalkowski Matthew A
Ye Kai
Yellapantula Venkata D
Yoon Christopher J
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Digital Commons@Becker

MSIsensor-ct: Microsatellite instability detection using cfDNA sequencing data

Author: Ding Li
Duan Xiaohong
Han Xinyin
He Jiayin
He Xiaoyu
Li Ruilin
Niu Beifang
Wang Dongliang
Wendl Michael C
Yuan Danyang
Zhang Shuying
Zhou Daniel Cui
Publication venue: Digital Commons@Becker
Publication date: 02/09/2021

MOTIVATION: Microsatellite instability (MSI) is a promising biomarker for cancer prognosis and chemosensitivity. Techniques are rapidly evolving for the detection of MSI from tumor-normal paired or tumor-only sequencing data. However, tumor tissues are often insufficient, unavailable, or otherwise difficult to procure. Increasing clinical evidence indicates the enormous potential of plasma circulating cell-free DNA (cfDNA) technology as a noninvasive MSI detection approach. RESULTS: We developed MSIsensor-ct, a bioinformatics tool based on a machine learning protocol, dedicated to detecting MSI status using cfDNA sequencing data with a potential stable MSIscore threshold of 20%. Evaluation of MSIsensor-ct on independent testing datasets with various levels of circulating tumor DNA (ctDNA) and sequencing depth showed 100% accuracy within the limit of detection (LOD) of 0.05% ctDNA content. MSIsensor-ct requires only BAM files as input, rendering it user-friendly and readily integrated into next generation sequencing (NGS) analysis pipelines. AVAILABILITY: MSIsensor-ct is freely available at https://github.com/niu-lab/MSIsensor-ct. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.

Digital Commons@Becker

Proteogenomic integration reveals therapeutic targets in breast cancer xenografts

Author: Cao Song
Davies Sherri R
Ding Li
Erdmann-Gilmore Petra
et al
Guo Zhanfang
Held Jason M
Hoog Jeremy
Huang Kuan-lin
Li Shunqiang
Ma Cynthia
McLellan Michael D
Niu Beifang
Sanati Souzan
Scott Adam
Snider Jacqueline E
Sun Sam Qiancheng
Townsend R. Reid
Wendl Michael C
Wyczalkowski Matthew A
Ye Kai
Yoon Christopher
Publication venue: Digital Commons@Becker
Publication date: 01/01/2017

Digital Commons@Becker

BreakTrans: Uncovering the genomic architecture of gene fusions

Author: Chen Ken
Ding Li
Fan Xian
Hoadley Katherine A
Ley Timothy J
Mardis Elaine R
McLellan Michael D
Navin Nicholas E
Niu Beifang
Perou Charles M
Schmidt Heather K
Wallis John W
Wang Yong
Wilson Richard K
Zhao Hao
Publication venue: Digital Commons@Becker
Publication date: 01/01/2013

Producing gene fusions through genomic structural rearrangements is a major mechanism for tumor evolution. Therefore, accurately detecting gene fusions and the originating rearrangements is of great importance for personalized cancer diagnosis and targeted therapy. We present a tool, BreakTrans, that systematically maps predicted gene fusions to structural rearrangements. Thus, BreakTrans not only validates both types of predictions, but also provides mechanistic interpretations. BreakTrans effectively validates known fusions and discovers novel events in a breast cancer cell line. Applying BreakTrans to 43 breast cancer samples in The Cancer Genome Atlas identifies 90 genomically validated gene fusions. BreakTrans is available at http://bioinformatics.mdanderson.org/main/BreakTran

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

Carolina Digital Repository

Divergent viral presentation among human tumors and adjacent normal tissues

Author: Cao Song
Chen Feng
Chen Ken
Ding Li
Dipersio John F
Gay Hiram
Grubb III Robert
Jayasinghe Reyka
Johnson Kimberly J
Niu Beifang
Rader Janet S
Wendl Michael C
Wu Song
Wyczalkowski Matthew A
Wylie Kristine
Xie Mingchao
Ye Kai
Publication venue: Digital Commons@Becker
Publication date: 01/01/2016

We applied a newly developed bioinformatics system called VirusScan to investigate the viral basis of 6,813 human tumors and 559 adjacent normal samples across 23 cancer types and identified 505 virus-positive samples with distinctive, organ system- and cancer type-specific distributions. We found that herpes viruses (e.g., subtypes HHV4, HHV5, and HHV6) that are highly prevalent across cancers of the digestive tract showed significantly higher abundances in tumor versus adjacent normal samples, supporting their association with these cancers. We also found three HPV16-positive samples in brain lower grade glioma (LGG). Further, recurrent HBV integration at the KMT2B locus is present in three liver tumors, but absent in their matched adjacent normal samples, indicating that viral integration-induced host driver genetic alterations are required on top of viral oncogene expression for initiation and progression of liver hepatocellular carcinoma. Notably, viral integrations were found in many genes, including novel recurrent HPV integrations at PTPN13 in cervical cancer. Finally, we observed a set of HHV4 and HBV variants strongly associated with ethnic groups, likely due to viral sequence evolution under environmental influences. These findings provide important new insights into viral roles in tumor initiation and progression and potential new therapeutic targets.

Digital Commons@Becker

PubMed Central

Patterns and functional implications of rare germline variants across 12 cancer types

Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cases from The Cancer Genome Atlas representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.

Digital Commons@Becker

PubMed Central

Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin

Recent genomic analyses of pathologically defined tumor types identify “within-a-tissue” disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.

Carolina Digital Repository